In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
In [5]:
# Load the car-evaluation data set and peek at the first rows.
df = pd.read_csv("car.csv")
df.head()
Out[5]:
In [6]:
# Inspect each column's categorical levels before choosing the ordinal
# encodings below.  Single-argument print() works in both Python 2 and 3,
# unlike the original `print x` statements.
for col in ('buying', 'maint', 'doors', 'persons',
            'lug_boot', 'safety', 'acceptability'):
    print(df[col].unique())
In [7]:
# Ordinal encodings: each categorical level maps to an integer that
# preserves its natural ordering.
map1 = {'low': 1, 'med': 2, 'high': 3, 'vhigh': 4}    # buying / maint / safety
map2 = {'small': 1, 'med': 2, 'big': 3}               # lug_boot
map3 = {'unacc': 1, 'acc': 2, 'good': 3, 'vgood': 4}  # acceptability (target)
map4 = {'2': 2, '4': 4, 'more': 5}                    # persons
map5 = {'2': 2, '3': 3, '4': 4, '5more': 5}           # doors
In [9]:
# Predictor columns: everything except the target 'acceptability'.
features = [c for c in df.columns if c != 'acceptability']

# Encode every column on a copy so the raw frame `df` stays untouched.
# Bracket indexing (df1[col] = ...) is used instead of the original
# attribute-style assignment, which is fragile and non-idiomatic.
column_maps = {'buying': map1,
               'maint': map1,
               'doors': map5,
               'persons': map4,
               'lug_boot': map2,
               'safety': map1,
               'acceptability': map3}
df1 = df.copy()
for col, mapping in column_maps.items():
    df1[col] = df[col].map(mapping)

X = df1[features]
y = df1['acceptability']
X.head(10)  # sanity check: all predictors are now numeric
#making sure it worked
Out[9]:
In [13]:
# NOTE(review): sklearn.cross_validation is the pre-0.18 API; modern sklearn
# moved these names to sklearn.model_selection (with a different KFold
# signature), so upgrading sklearn requires touching every KFold(...) call.
from sklearn.cross_validation import train_test_split, KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, classification_report

# Hold out 30% for testing; stratify so class proportions survive the split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

def evaluate_model(model):
    """Fit `model` on the training split, print test-set confusion matrix and
    classification report, and return the test-set accuracy."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    return accuracy_score(y_test, y_pred)

# name -> {'model': estimator, 'score': held-out accuracy}
various_models = {}
In [14]:
# Baseline: KNN with default hyper-parameters; keep its test accuracy in `a`.
from sklearn.neighbors import KNeighborsClassifier
a = evaluate_model(KNeighborsClassifier())
In [15]:
# NOTE(review): sklearn.grid_search is the pre-0.18 module; newer releases
# provide GridSearchCV via sklearn.model_selection.
from sklearn.grid_search import GridSearchCV
# Try every neighbourhood size from 2 to 59, 3-fold shuffled CV.
params = {'n_neighbors': range(2,60)}
gsknn = GridSearchCV(KNeighborsClassifier(),
params, n_jobs=-1,
cv=KFold(len(y), n_folds=3, shuffle=True))
In [16]:
# Run the exhaustive n_neighbors search on the full data set.
gsknn.fit(X, y)
Out[16]:
In [17]:
# Best n_neighbors found by the grid search.
gsknn.best_params_
Out[17]:
In [18]:
# Mean cross-validated accuracy of the best parameter setting.
gsknn.best_score_
Out[18]:
In [19]:
# Refit the tuned KNN on the train split and report held-out metrics.
evaluate_model(gsknn.best_estimator_)
Out[19]:
In [20]:
# Store the tuned KNN together with *its own* held-out accuracy.
# Bug fix: the original stored `a`, which is the accuracy of the untuned
# baseline KNN from an earlier cell, not of gsknn.best_estimator_.
various_models['knn'] = {'model': gsknn.best_estimator_,
                         'score': evaluate_model(gsknn.best_estimator_)}
In [21]:
# Bagging ensemble built from default KNN base classifiers.
from sklearn.ensemble import BaggingClassifier
baggingknn = BaggingClassifier(KNeighborsClassifier())
In [22]:
# Held-out performance of the untuned bagged KNN.
evaluate_model(baggingknn)
Out[22]:
In [23]:
# Shared bagging grid (reused for every bagged model below): ensemble size,
# row/feature subsampling fractions, and bootstrap of features.
bagging_params = {'n_estimators': [10, 20],
'max_samples': [0.7, 1.0],
'max_features': [0.7, 1.0],
'bootstrap_features': [True, False]}
gsbaggingknn = GridSearchCV(baggingknn,
bagging_params, n_jobs=-1,
cv=KFold(len(y), n_folds=3, shuffle=True))
In [24]:
# Grid-search the bagging hyper-parameters (3-fold CV on the full data).
gsbaggingknn.fit(X, y)
Out[24]:
In [25]:
# Winning bagging configuration for the KNN ensemble.
gsbaggingknn.best_params_
Out[25]:
In [26]:
# Keep the tuned bagged-KNN model and its held-out accuracy.
various_models['gsbaggingknn'] = {'model': gsbaggingknn.best_estimator_,
'score': evaluate_model(gsbaggingknn.best_estimator_)}
In [27]:
from sklearn.linear_model import LogisticRegression

# Baseline logistic regression with default hyper-parameters.
lr = LogisticRegression()
lr_score = evaluate_model(lr)
various_models['lr'] = {'model': lr, 'score': lr_score}
In [28]:
# Grid-search regularisation strength and penalty type for logistic
# regression, then store the tuned model with its held-out accuracy.
# (Py2-only `print x` statements replaced with print(), which behaves
# identically for a single argument under Python 2 and 3.)
params = {'C': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
          'penalty': ['l1', 'l2']}
gslr = GridSearchCV(lr, params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))
gslr.fit(X, y)
print(gslr.best_params_)
print(gslr.best_score_)
various_models['gslr'] = {'model': gslr.best_estimator_,
                          'score': evaluate_model(gslr.best_estimator_)}
In [48]:
# Bag the tuned logistic regression and grid-search the bagging knobs.
gsbagginglr = GridSearchCV(BaggingClassifier(gslr.best_estimator_),
                           bagging_params, n_jobs=-1,
                           cv=KFold(len(y), n_folds=3, shuffle=True))
gsbagginglr.fit(X, y)
print(gsbagginglr.best_params_)
print(gsbagginglr.best_score_)
various_models['gsbagginglr'] = {'model': gsbagginglr.best_estimator_,
                                 'score': evaluate_model(gsbagginglr.best_estimator_)}
In [ ]:
from sklearn.tree import DecisionTreeClassifier

# Baseline decision tree with default settings.
dt = DecisionTreeClassifier()
dt_score = evaluate_model(dt)
various_models['dt'] = {'model': dt, 'score': dt_score}
In [ ]:
# Grid-search the main decision-tree regularisation knobs.
params = {'criterion': ['gini', 'entropy'],
          'splitter': ['best', 'random'],
          'max_depth': [None, 5, 10],
          'min_samples_split': [2, 5],
          'min_samples_leaf': [1, 2, 3]}
gsdt = GridSearchCV(dt, params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))
gsdt.fit(X, y)
print(gsdt.best_params_)
print(gsdt.best_score_)
various_models['gsdt'] = {'model': gsdt.best_estimator_,
                          'score': evaluate_model(gsdt.best_estimator_)}
In [ ]:
# Bag the tuned decision tree and grid-search the bagging knobs.
gsbaggingdt = GridSearchCV(BaggingClassifier(gsdt.best_estimator_),
                           bagging_params, n_jobs=-1,
                           cv=KFold(len(y), n_folds=3, shuffle=True))
gsbaggingdt.fit(X, y)
print(gsbaggingdt.best_params_)
print(gsbaggingdt.best_score_)
various_models['gsbaggingdt'] = {'model': gsbaggingdt.best_estimator_,
                                 'score': evaluate_model(gsbaggingdt.best_estimator_)}
In [30]:
from sklearn.svm import SVC

# Baseline support-vector classifier with default hyper-parameters.
svm = SVC()
svm_score = evaluate_model(svm)
various_models['svm'] = {'model': svm, 'score': svm_score}
In [31]:
# Grid-search SVM regularisation, kernel width, and kernel type.
params = {'C': [0.01, 0.1, 1.0, 10.0, 30.0, 100.0],
          'gamma': ['auto', 0.1, 1.0, 10.0],
          'kernel': ['linear', 'rbf']}
gssvm = GridSearchCV(svm, params, n_jobs=-1,
                     cv=KFold(len(y), n_folds=3, shuffle=True))
gssvm.fit(X, y)
print(gssvm.best_params_)
print(gssvm.best_score_)
various_models['gssvm'] = {'model': gssvm.best_estimator_,
                           'score': evaluate_model(gssvm.best_estimator_)}
In [32]:
# Bag the tuned SVM and grid-search the bagging knobs.
gsbaggingsvm = GridSearchCV(BaggingClassifier(gssvm.best_estimator_),
                            bagging_params, n_jobs=-1,
                            cv=KFold(len(y), n_folds=3, shuffle=True))
gsbaggingsvm.fit(X, y)
print(gsbaggingsvm.best_params_)
print(gsbaggingsvm.best_score_)
various_models['gsbaggingsvm'] = {'model': gsbaggingsvm.best_estimator_,
                                  'score': evaluate_model(gsbaggingsvm.best_estimator_)}
In [44]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# Untuned tree ensembles as further baselines.
rf = RandomForestClassifier()
various_models['rf'] = {'model': rf, 'score': evaluate_model(rf)}

et = ExtraTreesClassifier()
various_models['et'] = {'model': et, 'score': evaluate_model(et)}
In [45]:
# Grid-search random-forest size, split criterion, depth, and class weights.
# This grid is reused for the extra-trees search in the next cell.
params = {'n_estimators': [3, 5, 10, 50],
          'criterion': ['gini', 'entropy'],
          'max_depth': [None, 3, 5],
          'min_samples_split': [2, 5],
          'class_weight': [None, 'balanced']}
gsrf = GridSearchCV(RandomForestClassifier(n_jobs=-1),
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))
gsrf.fit(X, y)
print(gsrf.best_params_)
print(gsrf.best_score_)
various_models['gsrf'] = {'model': gsrf.best_estimator_,
                          'score': evaluate_model(gsrf.best_estimator_)}
In [47]:
# Grid-search extra-trees with the same grid used for the random forest.
gset = GridSearchCV(ExtraTreesClassifier(n_jobs=-1),
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))
gset.fit(X, y)
print(gset.best_params_)
print(gset.best_score_)
various_models['gset'] = {'model': gset.best_estimator_,
                          'score': evaluate_model(gset.best_estimator_)}
In [50]:
# Collect every model's held-out accuracy into one sorted table and chart it.
# `.items()` replaces Py2-only `.iteritems()`; iteration order is irrelevant
# because the frame is sorted immediately afterwards.
scores = pd.DataFrame([(k, v['score']) for k, v in various_models.items()],
                      columns=['model', 'score']).set_index('model').sort_values('score', ascending=False)
plt.style.use('fivethirtyeight')
scores.plot(kind='bar')
plt.ylim(0.5, 1.05)
scores
Out[50]:
In [51]:
# Re-score every stored model with stratified 3-fold cross-validation on the
# full data set, to double-check the single train/test-split accuracies.
from sklearn.cross_validation import cross_val_score, StratifiedKFold

def retest(model):
    """Return (mean, std) of stratified 3-fold CV accuracy for `model`."""
    scores = cross_val_score(model, X, y,
                             cv=StratifiedKFold(y, shuffle=True),
                             n_jobs=-1)
    return scores.mean(), scores.std()

for k, v in various_models.items():
    print(k)  # progress indicator (original used Py2-only `print k,`)
    various_models[k]['cvres'] = retest(v['model'])
In [52]:
# Plot cross-validated mean accuracy with std-dev error bars per model.
# `.items()` replaces Py2-only `.iteritems()`; the unused `rects1` local
# was dropped.
cvscores = pd.DataFrame([(k, v['cvres'][0], v['cvres'][1]) for k, v in various_models.items()],
                        columns=['model', 'score', 'error']).set_index('model').sort_values('score', ascending=False)
fig, ax = plt.subplots()
ax.bar(range(len(cvscores)), cvscores.score,
       yerr=cvscores.error,
       tick_label=cvscores.index)
plt.style.use('fivethirtyeight')
ax.set_ylabel('Scores')
plt.xticks(rotation=70)
plt.ylim(0.5, 1.05)
cvscores
Out[52]: